The dynamics of competitive learning: 
the role of updates and memory 
We examine the effects of memory and different updating paradigms in a game-theoretic model 
of competitive learning, where agents are influenced in their choice of strategy by both the choices 
made by, and the consequent success rates of, their immediate neighbours. We apply parallel and 
sequential updates in all possible combinations to the two competing rules, and find, typically, that 
the phase diagram of the model consists of a disordered phase separating two ordered phases at 
coexistence. A major result is that the corresponding critical exponents belong to the generalised 
universality class of the voter model. When the two strategies are distinct but not too different, we 
find the expected linear response behaviour as a function of their difference. Finally, we look at the 
extreme situation when a superior strategy, accompanied by a short memory of earlier outcomes, 
is pitted against its inverse; interestingly, we find that a long memory of earlier outcomes can 
occasionally compensate for the choice of a globally inferior strategy. 

PACS numbers: 05.70.Jk, 87.29.lv, 87.19.Ge, 02.50.Le 
I. INTRODUCTION 

The modelling of social behaviour is of increasing concern to statistical physicists [1]. Studies
of social and biological systems often reveal that even when the interactions of a given
individual are very localised in time and space, collective, regular behaviour can emerge: this
is analogous to the cooperative behaviour manifested by emergent systems in the natural world.
Such social regularities may well take the form of learning, when individuals adopt the
behaviour of other individuals. From the perspective of game theory [2], this can be seen as
the adoption of a particular strategy, whose result may or may not be a favourable outcome.
It is then reasonable to expect that the effectiveness of a strategy in yielding favourable
outcomes should influence how likely it is to persist and spread through the population; the
resulting ideas of strategic learning [3] have found wide application, from economics [4] to
cognitive science [5].

Against the backdrop of the above ideas, a model of strategic learning was introduced in [6],
with one of two possible strategies (denoted as $+$ and $-$ in the remainder of this paper)
being available to each agent on a lattice. The agents were referred to as 'myopic' (aware only
of their immediate neighbours) and 'memoryless' (unaware of their own and others' past
outcomes) in the paper on technology diffusion [4] that inspired the above model [6, 7]. The
question on which this body of work has centred is: despite these handicaps, can agents overall
learn to use the superior of two available technologies? Briefly, each agent changes (or does
not change) strategy based on two elementary rules at every time step: a majority-based rule,
reflecting its tendency to align with its local neighbourhood, followed by a performance-based
rule, where the agent adopts the strategy that 'wins' in its neighbourhood. This (relative)
success is measured in terms of outcomes, where the probability of a successful outcome for
strategy $+$ ($-$) is $p_+$ ($p_-$). Also, the model of [7] added to the description of [6] by
endowing the agents with memory: those agents who make their choices on the basis of the last
payoff alone are adjudged to be memoryless (with a corresponding parameter $\epsilon$ near 1),
while those who allow for memories of earlier outcomes may make decisions that run counter to
immediate evidence ($\epsilon$ small).

Some related ideas have been examined in recent work. For example, the issue of consensus
formation in a model of threshold learning [8] shows close analogies: in this model, the
competition between the 'noisy' signals from the immediate neighbours of an agent (cf. the
majority rule in [?]) and the acceptance threshold that agents require to change their state
(cf. the memory threshold in the performance-based rule of [?]) determines the phase diagrams
obtained. Recent studies of coevolving Glauber dynamics on networks [9] are also relevant,
since the model of [1] can be viewed as a competition between the Glauber dynamics of two sets
of Ising spins, corresponding to strategy and outcome respectively.

In the current paper, we take all these ideas further. First, we explore the effect of
different updates. If new information propagates sequentially through the network, and the
arrow of time is discernible in the decisions of individual agents, are the global phase
diagrams any different from what they would be if information were transmitted and all
decisions were taken simultaneously? Common sense tells us that sequential or parallel updates
should make a difference to the nature of the phase diagram, and the results of the present
paper confirm this. Also (unlike the work of [6, 7], which examined the situation at
coexistence) we look in this paper at the effects of disparate strategies ($p_+ \neq p_-$).
The final, and possibly most important, issue is that of memory, which acts as a threshold
governing change [?]: what is the effect of the threshold $\epsilon$, which tells the agent
that longer-term inputs are significant and need to be considered when making a decision? We
will find that, indeed, a longer memory of earlier outcomes can sometimes make up for the
choice of a globally inferior strategy.

The plan of this paper is as follows. In Section II we review the model of [6]. In Section III
we discuss the behaviour of the model for a range of updating schemes, in the presence of
memory. In Section IV we examine the behaviour of the model away from coexistence, as a
function of distinct parameter values for the two strategies; in particular we discuss here the
role of memory. In the concluding section, we discuss our results and put them in the context
of other recent work in the field.



II. DEFINITION OF THE MODEL 

The model of [6] involves two types of strategies, $-$ and $+$, where the $+$ strategy is
globally superior [4] to the $-$ strategy. As mentioned above, agents tend to follow the
strategy adopted by the majority of their neighbours, modifying this choice in a second step
(if necessary) according to which of the two strategies has proved to be the more successful.

Assuming that the agents sit at the nodes of a $d$-dimensional regular lattice with
coordination number $z = 2d$, the efficiency of an agent at site $i$ is represented by an Ising
spin variable:

$$\eta_i(t) = \begin{cases} +1 & \text{if } i \text{ is } + \text{ at time } t, \\ -1 & \text{if } i \text{ is } - \text{ at time } t. \end{cases} \tag{1}$$



The evolution dynamics of the lattice is governed by two rules. The first is a majority rule,
which consists of the alignment of an agent with the local field (created by its nearest
neighbours) acting upon it, according to:

$$\eta_i(t + \tau_1) = \begin{cases} +1 & \text{if } h_i(t) > 0, \\ \pm 1 \ \text{w.p. } 1/2 & \text{if } h_i(t) = 0, \\ -1 & \text{if } h_i(t) < 0. \end{cases} \tag{2}$$

Here, the local field

$$h_i(t) = \sum_j \eta_j(t) \tag{3}$$

is the sum of the efficiencies of the $z$ neighbouring agents $j$ of site $i$, and $\tau_1$ is
the associated time step. Next, a performance rule is applied. This starts with the assignment
of an outcome $\sigma_i$ (another Ising-like variable, with values of $\pm 1$ corresponding to
success and failure respectively) to each site $i$, according to the following rules:



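$$\begin{aligned}
\text{if } \eta_i(t) = +1, \ \text{then } \sigma_i(t + \tau_2) &= \begin{cases} +1 & \text{w.p. } p_+, \\ -1 & \text{w.p. } 1 - p_+, \end{cases} \\
\text{if } \eta_i(t) = -1, \ \text{then } \sigma_i(t + \tau_2) &= \begin{cases} +1 & \text{w.p. } p_-, \\ -1 & \text{w.p. } 1 - p_-, \end{cases}
\end{aligned} \tag{4}$$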



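where $\tau_2$ is the associated time step and $p_\pm$ are the probabilities of having a
successful outcome for the corresponding strategy. With $N_i^+(t)$ and $N_i^-(t)$ denoting the
total number of neighbours of a site $i$ who have adopted strategies $+$ and $-$ respectively,
and $I_i^+(t)$ ($I_i^-(t)$) denoting the number of successful outcomes within the set $N_i^+$
($N_i^-$), the dynamical rules for site $i$ are:

$$\begin{aligned}
\text{if } \eta_i(t) = +1 \ \text{and } \frac{I_i^+(t)}{N_i^+(t)} < \frac{I_i^-(t)}{N_i^-(t)}, \ \text{then } \eta_i(t + \tau_3) &= \begin{cases} -1 & \text{w.p. } \epsilon_+, \\ +1 & \text{w.p. } 1 - \epsilon_+, \end{cases} \\
\text{if } \eta_i(t) = -1 \ \text{and } \frac{I_i^-(t)}{N_i^-(t)} < \frac{I_i^+(t)}{N_i^+(t)}, \ \text{then } \eta_i(t + \tau_3) &= \begin{cases} +1 & \text{w.p. } \epsilon_-, \\ -1 & \text{w.p. } 1 - \epsilon_-. \end{cases}
\end{aligned} \tag{5}$$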



Here, the ratios $I_i^\pm(t)/N_i^\pm(t)$ are nothing but the average payoff assigned by an
agent to each of the two strategies in its neighbourhood at time $t$ (assuming that success
yields a payoff of unity and failure, zero). Also, $\tau_3$ is the associated time step and the
parameters $\epsilon_\pm$ are indicators of the memory associated with each strategy. In their
full generality, $\epsilon$ and $p$ are independent variables: the choice of a particular
strategy can be associated with either a short or a long memory. However, we would like in this
paper to answer a question which was posed, but not answered, in [?]: can the presence of a
good memory compensate for the choice of an inferior strategy? We therefore examine the extreme
situation when a globally superior strategy ($p_+ \gg p_-$), combined with a shorter memory
($\epsilon_+ \gg \epsilon_-$), is in competition with its inverse: this is the situation that
will be studied in Section IV. Setting the timescales

$$\tau_2 \to 0, \qquad \tau_1 = \tau_3 = 1, \tag{6}$$

the above steps of the performance rule are recast as effective dynamical rules involving the
efficiencies $\eta_i(t)$ and the associated local fields alone:



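$$\begin{aligned}
\text{if } \eta_i(t) = +1, \ \text{then } \eta_i(t + 1) &= \begin{cases} +1 & \text{w.p. } w_+[h_i(t)], \\ -1 & \text{w.p. } 1 - w_+[h_i(t)], \end{cases} \\
\text{if } \eta_i(t) = -1, \ \text{then } \eta_i(t + 1) &= \begin{cases} +1 & \text{w.p. } w_-[h_i(t)], \\ -1 & \text{w.p. } 1 - w_-[h_i(t)]. \end{cases}
\end{aligned} \tag{7}$$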



The effective transition probabilities $w_\pm(h)$ are evaluated by enumerating the $2^z$
possible realizations of the outcomes $\sigma_j$ of the sites neighbouring site $i$, and
weighting them appropriately. For a 2-d square lattice, the possible local field values at the
interfacial sites are 0 and $\pm 2$. The corresponding transition probabilities for these field
values are [6]:

$$\begin{aligned}
w_+(+2) &= 1 - \epsilon_+ \, p_- (1 - p_+^3), \\
w_-(+2) &= \epsilon_- (1 - p_-)\,[1 - (1 - p_+)^3], \\
w_+(0) &= 1 - \epsilon_+ \, p_- (1 - p_+)(2 - p_- - 2p_+ + 3p_- p_+), \\
w_-(0) &= \epsilon_- \, p_+ (1 - p_-)(2 - p_+ - 2p_- + 3p_- p_+), \\
w_+(-2) &= 1 - \epsilon_+ (1 - p_+)\,[1 - (1 - p_-)^3], \\
w_-(-2) &= \epsilon_- \, p_+ (1 - p_-^3).
\end{aligned} \tag{8}$$
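
The enumeration behind Equation (8) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names (`flip_probability`, `w_plus`, `w_minus`) are hypothetical, the payoff comparison is done by cross-multiplication so that no division by zero occurs, and a site whose neighbours all share one strategy is assumed never to convert (the ratio of Equation (5) is undefined there).

```python
from itertools import product


def flip_probability(own, n_plus, n_minus, p_plus, p_minus):
    """Probability that the average payoff of strategy `own` (+1 or -1) is
    strictly smaller than that of the rival strategy, obtained by enumerating
    all success/failure outcomes of the neighbours (assumed helper)."""
    p = {+1: p_plus, -1: p_minus}
    n = {+1: n_plus, -1: n_minus}
    neighbours = [+1] * n_plus + [-1] * n_minus
    total = 0.0
    for outcomes in product((0, 1), repeat=len(neighbours)):
        weight = 1.0
        wins = {+1: 0, -1: 0}
        for strategy, success in zip(neighbours, outcomes):
            weight *= p[strategy] if success else 1.0 - p[strategy]
            wins[strategy] += success
        # I_own / N_own < I_rival / N_rival, written without division
        if wins[own] * n[-own] < wins[-own] * n[own]:
            total += weight
    return total


def w_plus(h, p_plus, p_minus, eps_plus, z=4):
    """Probability that a + site with local field h is + at the next step."""
    n_plus = (z + h) // 2
    return 1.0 - eps_plus * flip_probability(+1, n_plus, z - n_plus, p_plus, p_minus)


def w_minus(h, p_plus, p_minus, eps_minus, z=4):
    """Probability that a - site with local field h is + at the next step."""
    n_plus = (z + h) // 2
    return eps_minus * flip_probability(-1, n_plus, z - n_plus, p_plus, p_minus)
```

Evaluating these functions at $h = 0, \pm 2$ for arbitrary $p_\pm$ and $\epsilon_\pm$ gives a direct numerical check of the six closed-form expressions.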

In [?], the model was explored at coexistence with an ordered sequential update applied to
memoryless agents:

$$p_+ = p_-, \qquad \epsilon_+ = \epsilon_- = 1. \tag{9}$$

In the present paper, we go beyond this in two different ways. First, still at coexistence, we
explore the effect of different updates on the $p$-$\epsilon$ phase diagram of the model; next,
we examine the model away from coexistence, for distinct values of $p_\pm$ and $\epsilon_\pm$.
The basic quantities considered hereafter are the magnetization $M$, the staggered
magnetization $M_{stag}$ and the energy $E$. These quantities are defined for a finite sample
of $N$ agents (or sites) and $Nz/2$ bonds (or links), as

$$M = \frac{1}{N} \sum_i \eta_i, \qquad E = \frac{1}{Nz} \sum_{\langle ij \rangle} (1 - \eta_i \eta_j), \qquad M_{stag} = \frac{2}{N} \sum_i \eta_i \ \ \text{if } i \text{ is odd or even}. \tag{10}$$

In the following we shall usually consider mean values $\langle M \rangle$, $\langle E \rangle$
and $\langle M_{stag} \rangle$.
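
As an illustration, the three observables can be measured on a configuration with the following Python sketch. It rests on assumptions that should be read as such: $E$ is taken to be the fraction of active (unlike) bonds and $M_{stag}$ the magnetization of a single sublattice, which is the reading consistent with the limiting values quoted later in the text ($|M| = |M_{stag}| = 1$ and $E = 0$ for parallel order; $|M| = 0$, $|M_{stag}| = 1$ and $E = 1$ for anti-parallel order). The function name `observables` and the choice of the even sublattice are hypothetical.

```python
def observables(eta):
    """Return (M, M_stag, E) for an L x L configuration `eta` of +1/-1 values
    on a periodic square lattice (z = 4, L even)."""
    L = len(eta)
    N = L * L
    M = sum(sum(row) for row in eta) / N
    # Assumed reading of M_stag: sum over one sublattice, (i + j) even,
    # normalised by the N/2 sites of that sublattice.
    M_stag = 2.0 * sum(eta[i][j] for i in range(L) for j in range(L)
                       if (i + j) % 2 == 0) / N
    # Count active bonds once each: right and down neighbours of every site.
    active = 0
    for i in range(L):
        for j in range(L):
            s = eta[i][j]
            active += (1 - s * eta[i][(j + 1) % L]) // 2
            active += (1 - s * eta[(i + 1) % L][j]) // 2
    E = active / (2 * N)  # 2N = Nz/2 bonds for z = 4
    return M, M_stag, E
```

A uniform lattice and a checkerboard are convenient test configurations, since they realise the two kinds of frozen order.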



III. THE EFFECT OF FINITE MEMORY, AND 
OF DIFFERENT UPDATES 

We begin this section with a review of the physical significance of updating schemes. Most
generally, updates can be random or ordered as follows:

• Random: Here, sites are chosen at random for the consecutive application of rules.

• Ordered: Here, sites are chosen in an ordered fashion, i.e., after choosing the (i, j)th
site, the (i, j + 1)th site is selected.

Since the sociological basis for this work was the propagation of innovation through connected
societies [4], we choose to deal only with ordered updates here. However, even ordered updates
have two subclasses: parallel and sequential. Assume a condition A such that when an agent
satisfies A, it changes strategy:

• Sequential update: In this type of update, we check the condition A on the (i, j)th site,
then update the efficiency of the site and proceed to the (i, j + 1)th site using the updated
value of the (i, j)th site.

• Parallel update: In this type of update, we check the condition A on the (i, j)th site, do
not update the site but instead save the update-decision in memory, and proceed to the next
site. Once the whole lattice is swept, all the saved update-decisions are implemented
'simultaneously'.

The choice of different updates generally corresponds to different physical situations: it has
been shown that it also leads to a disparity in the convergence time of the systems concerned
[13, ?]. We therefore examine all possible combinations for our two update rules:

I parallel updates for both majority rule and performance rules (pp).

II parallel update for majority rule and sequential update for performance rules (ps).

III sequential updates for both majority rule and performance rules (ss).

IV sequential update for majority rule and parallel update for performance rules (sp).

In the following subsections, we explore the phase dynamics at coexistence for each of these
update rules in turn, for both parameters $p$ and $\epsilon$. We state at the outset that all
the updates which we consider (except for the sp update) result in models which are in the
general universality class of the voter model [12]: the inverse energy $1/E(t)$ is thus always
proportional to the logarithm of time, $\ln t$. When, as in the case of the ss update, the
value of the slope is exactly $2/\pi$ [?], the exact universality class of the voter model is
retrieved.
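
The difference between the sequential and parallel ordered updates defined in this section can be made concrete with a small Python sketch. This is an illustration under assumptions, not the code used for the simulations: the names `local_field`, `majority_rule` and `sweep` are hypothetical, the lattice is a periodic square lattice stored as a list of lists, and ties in the majority rule are broken with probability 1/2.

```python
import random


def local_field(eta, i, j):
    """Sum of the four nearest-neighbour efficiencies (periodic boundaries)."""
    L = len(eta)
    return (eta[(i - 1) % L][j] + eta[(i + 1) % L][j]
            + eta[i][(j - 1) % L] + eta[i][(j + 1) % L])


def majority_rule(eta, i, j, rng):
    """Align with the local field; a zero field is resolved at random."""
    h = local_field(eta, i, j)
    if h > 0:
        return +1
    if h < 0:
        return -1
    return rng.choice((+1, -1))


def sweep(eta, rule, rng, parallel):
    """One ordered sweep of `rule` over the lattice, modifying `eta` in place.

    parallel=True : every decision is computed from a frozen copy of the old
                    configuration, so all changes take effect 'simultaneously'.
    parallel=False: each site reads the lattice as already modified by the
                    sites visited before it (sequential update)."""
    L = len(eta)
    source = [row[:] for row in eta] if parallel else eta
    for i in range(L):
        for j in range(L):
            eta[i][j] = rule(source, i, j, rng)
    return eta
```

Applied to a checkerboard, the parallel majority sweep simply inverts every site, so the anti-parallel pattern survives, whereas a sequential sweep lets each conversion influence the sites visited after it.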



A. The ss update 

This is the update that was used throughout [6]; however, the phase behaviour of the model was
there explored only for the parameter $p$, whereas here we extend it to the parameter
$\epsilon$. In Figure 1 we plot the inverse energy $1/E(t)$ in the $p$-$\epsilon$ plane at time
$t = 512$ for a square lattice of size $N = 64^2$. This phase diagram shows clearly the
existence of a disordered paramagnetic phase embedded in a largely frozen phase elsewhere. The
disordered phase exists for $p_{c1} (= 0.56 \pm 0.01) < p < p_{c2} (= 0.70 \pm 0.01)$ when
$\epsilon > 0.980$. Our results agree with those of [6] for $\epsilon = 1$, and extend them
across the rest of the $p$-$\epsilon$ plane. We mention here that the average time required to
reach consensus increases exponentially as $p$ decreases in the frozen phase, leading to the
presence of striped states [13] at limiting values of $p$. Figure 1 also makes it clear that
increasing memory wipes out the disordered phase: this is as it should be, since the disordered
phase is generated by the competition between the majority and performance-based rules, which
is dulled by increasing memory.



FIG. 1. (color online) [ss update] Phase diagram of the model with an ss update. Plot of the
inverse energy $1/E(t)$ at time $t = 512$ for a square lattice of size $N = 64^2$ in the
$p$-$\epsilon$ plane. The black region shows the disordered phase and the yellowish (light
grey) region shows the frozen phase.

Figure 2 shows snapshots of the dynamics of the model using a lattice of size $N = 512^2$ at
times $t = 8$, $t = 64$, and $t = 512$ with random initial configurations and parameter values
$p = 0.72$ (very close to the critical point $p_{c2}$) and $\epsilon = 1.0$. The plots reveal
characteristically voter-like [12] coarsening behaviour.




FIG. 2. (color online) [ss update] Snapshots of the dynamics of the ss-updated model. Each plot
is a portion (of size $100^2$) of a square lattice of $N = 256^2$ for $p = 0.72$ and
$\epsilon = 1.0$ at times $t = 8$ (top-left), $t = 64$ (top-right) and $t = 512$ (bottom).

In Figure 3 we have plotted the inverse energy $1/E(t)$ against the natural logarithm of time
$\ln t$ for values of $p$ around the critical point $p_{c2} = 0.70 \pm 0.01$. Each of the
curves is obtained by averaging over 200 independent samples of size $256^2$. At the critical
point, we obtain a straight line with a slope close to $2/\pi$ [?], a behaviour characteristic
of the exact voter model [12] that corresponds to

$$E(t) \approx \frac{\pi/2}{\ln t}. \tag{11}$$

Similar behaviour is obtained at the other critical point $p_{c1}$, in agreement with [?].
FIG. 3. (color online) [ss update] Plot of the inverse energy $1/E(t)$ versus the natural
logarithm of time $\ln t$ for different values of $p$ close to $p_{c2} = 0.70$, with
$\epsilon$ set to 1.0. The lattice size is $N = 256^2$, and the $p$-values are indicated on the
curves. The curve corresponding to $p = p_{c2} = 0.70$ (shown in red) has slope approximately
$2/\pi$ [see Equation (11)].
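
The slope quoted for the $1/E(t)$ versus $\ln t$ curves can be extracted from a measured time series by an ordinary least-squares fit. The Python sketch below is one assumed way of doing so; the function name `inverse_energy_slope` is hypothetical and the fitting window used for the figures is not specified in the text.

```python
import math


def inverse_energy_slope(times, energies):
    """Least-squares slope of 1/E(t) plotted against ln t."""
    xs = [math.log(t) for t in times]
    ys = [1.0 / e for e in energies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Fed with synthetic data obeying Equation (11) exactly, the fit returns $2/\pi$; in a simulation the same routine would be applied to the sample-averaged energy at the candidate critical value of $p$.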



B. The pp update 

In this case, both environmental majority and performance-based rules are applied using
parallel updates. As we will see, although the universality class of the model is qualitatively
unchanged, this update results in the appearance of novel ordered phases compared to the ss
update. As before, we first plot the phase diagram for all values of $p$ and $\epsilon$, then
show snapshots of the dynamics, and finally get a more quantitative feel for the behaviour of
key quantities as a function of $p$.

Accordingly, Figure 4 (top and bottom) shows plots of the absolute values of the magnetization
$|M|$ and the staggered magnetization $|M_{stag}|$, at time $t = 512$ for a lattice size
$N = 100^2$, in the $p$-$\epsilon$ plane using pp updates. In these phase diagrams, we see
clear evidence of the existence of two distinct frozen phases separated by a disordered phase.
Looking along the line $\epsilon = 1$, disorder prevails for $p_{c1} < p < p_{c2}$ with
$p_{c1} = 0.43 \pm 0.01$ and $p_{c2} = 0.57 \pm 0.01$. Notice the symmetry of the two critical
points about $p = 0.5$: we shall have more to say about this later on.

For $p$ below $p_{c1}$ there is a frozen phase characterised by overall alignment of spins: we
call this the parallel frozen phase (PFP). For $p$ above $p_{c2}$, the frozen phase that
appears is characterised by an anti-parallel ordering of spins: we call this the anti-parallel
frozen phase (AFP). We mention also that in the AFP, the lattice may have more than one
anti-parallel domain, with thin frustrated chains running in between them. This frustration can
be attributed to the inability of the different domains to align with each other under periodic
boundary conditions. The disturbances caused by these chains (in quantities such as $|M|$ or
$E$) due to misalignment decrease as $1/\sqrt{N}$ and also appear to vanish for large times.
Again, we notice that the phase transition disappears for low $\epsilon$; in fact, at very low
values of $\epsilon$ the evolving lattice may get trapped into striped states [13] at long
times.

FIG. 4. (color online) [pp update] Plot of the absolute value of the magnetization $|M|$ (top)
and the absolute value of the staggered magnetization $|M_{stag}|$ (bottom) at time $t = 512$
for a lattice of $N = 100^2$ in the $p$-$\epsilon$ plane. In the top figure, the yellowish
(light grey) region refers to the parallel frozen phase (PFP), while the black region includes
both the anti-parallel frozen phase (AFP) and the disordered region. In the bottom figure, the
black region represents the disordered phase characterised by very low $|M_{stag}|$.

FIG. 5. (color online) [pp update] Snapshots of the dynamics of the pp-updated model on a
square lattice, for $p = 0.41$ (leftmost), $p = 0.50$ (centre) and $p = 0.59$ (rightmost) at
time $t = 512$, with $\epsilon = 1.0$. The yellow (light grey) and black colours represent the
two strategies, while the greyish grid corresponds to anti-parallel arrangements of yellow
(light grey) and black. The leftmost picture represents the PFP (see text), the centre one the
disordered phase, and the rightmost one the AFP (see text).

This can be understood as follows: the effect of a long memory ($\epsilon$ small) strongly
reduces the relative impact of the performance-based rule. Depending on the value of
$\epsilon$, the performance rule may not be effective for several timesteps whereas the
majority rule is implemented at every timestep. In the limit of vanishing $\epsilon$, then,
only the (zero-temperature) majority rule will be effective, leading to stripe formation as
predicted by [13] for this situation.

Figure 5 comprises snapshots of the dynamics of the model for a 2-d square lattice of size
$N = 512^2$ and at time $t = 256$, with random initial configurations. The plots show a portion
of size $100^2$ of the square lattice for three values of $p$: $p = 0.41$ (near the critical
point $p_{c1}$ between the PFP and the paramagnetic phase), $p = 0.50$ (within the paramagnetic
phase) and $p = 0.59$ (near the critical point $p_{c2}$ separating the paramagnetic phase from
the AFP), with $\epsilon = 1$. The snapshot at $p = 0.41$ shows the lattice evolving towards
consensus (parallel alignment) with the formation of domains of one type only. The snapshot at
$p = 0.50$ shows the lattice in its disordered phase, while the one at $p = 0.59$ shows that
the nature of the lattice ordering is anti-parallel.

To investigate this more quantitatively, we plot the absolute value of the magnetization
$|M|$, the absolute value of the staggered magnetization $|M_{stag}|$ and the energy $E(t)$
against $p$, with $\epsilon = 1.0$, in Figure 6. These measurements were recorded using a
square lattice of size $N = 80^2$ at time $t = 10^6$. All the curves are averaged over 100
independent samples for each value of $p$. In the region $p < p_{c1}$, the values of the
magnetization $|M|$ and staggered magnetization $|M_{stag}|$ are both equal to unity at
saturation, implying a parallel alignment of the sites; whereas for $p$ above $p_{c2}$, the
magnetization $|M|$ is zero and the staggered magnetization $|M_{stag}|$ equals unity at
saturation, indicating an anti-parallel alignment of the sites. The energy graph is consistent
with this interpretation, given the definition of the energy in Equation (10): zero in the PFP,
middling in the paramagnetic phase and unity in the AFP.

FIG. 6. (color online) [pp update] Plot of the absolute value of the magnetization $|M|$, the
absolute value of the staggered magnetization $|M_{stag}|$ and the energy $E$ against $p$, with
$\epsilon = 1$ and lattice size $N = 80^2$. Each curve is drawn using symbols (and colour) as
indicated in the legend.

In order to confirm the voter-like nature of the critical points, we plot the inverse energy
$1/E(t)$ against the natural logarithm of time $\ln t$, choosing $p$ values near both critical
points (see Figure 7 and Figure 8). Each curve is an average over 200 independent samples.
Exactly at the critical points $p_{c1} = 0.43$ and $p_{c2} = 0.57$, a linear behaviour of the
inverse energy with respect to $\ln t$ is found, with slopes of $1/2\pi$ and $-1/5\pi$
respectively. While the critical exponents are those of the voter model [13], the values of the
slope are different from $2/\pi$: we find therefore that the pp update of the model belongs to
the universality class of the generalised, rather than the exact, voter model [12].
FIG. 7. (color online) [pp update] Plot of the inverse energy $1/E(t)$ versus $\ln t$ for
different values of $p$ close to $p_{c1} = 0.43$ and $\epsilon = 1$, for a lattice of size
$N = 100^2$. The $p$-values are indicated on the curves. The straight line corresponding to
$p_{c1} = 0.43$ (shown in red) has slope $1/2\pi$ approximately.

To conclude this subsection: the main effect of the pp update is to change the nature of the
ordering in one of the two frozen phases, so that anti-parallel ordering is found in the
high-$p$ frozen phase. As before, the effect of increasing memory (going to low $\epsilon$) is
to smear out the phase transitions to the disordered phase, by undermining the effect of the
outcome-based rule whose competition with the majority rule causes the appearance of disorder.
Such instances of mixed domains have been found in recent work on coevolving (parallel)
dynamics [9]; some features of these results also appear in studies of threshold dynamics of
societal systems [8]. For a real-life example of the AFP in the case of technology diffusion,
we cite the results of [3], where the authors conclude that "in technology clusters where
direct competitors are right next door, leading firms generate innovations that are
technologically very distant from their neighbours" [14].




FIG. 8. (color online) [pp update] Plot of the inverse energy $1/E(t)$ versus $\ln t$ for
different values of $p$ close to $p_{c2} = 0.57$, with $\epsilon = 1$, for a lattice of size
$N = 100^2$. The $p$-values are indicated on the curves. The straight line corresponding to
$p_{c2} = 0.57$ (shown in red) has slope $-1/5\pi$ approximately.



C. The ps update 

The behaviour of the ps-updated model is qualitatively similar to that of the pp-updated model
above. Again, there are two frozen phases, PFP and AFP, separated by a disordered phase: the
values of the critical points $p_{c1}$ and $p_{c2}$ are however shifted, such that the
disordered region extends between $p_{c1} = 0.31 \pm 0.01$ and $p_{c2} = 0.69 \pm 0.01$ at
$\epsilon = 1.0$. We find once again that the two critical points are symmetrically placed with
respect to $p = 0.5$, as in the pp update: we will give an argument for why this is so in the
following subsection.

To avoid repetition, we present only the phase diagram for the staggered magnetisation as a
function of $p$ and $\epsilon$: Figure 9 shows the absolute value of the staggered
magnetization $|M_{stag}|$ of the system at time $t = 512$ for a square lattice of size
$N = 100^2$. The paramagnetic region, with low values of $|M_{stag}|$, is coloured black in the
figure, whereas the frozen regions (containing either parallel or anti-parallel ordering), with
high values of $|M_{stag}|$, are coloured yellow (light grey). These phases are investigated
more quantitatively in Figure 10, where we plot the absolute value of the magnetization $|M|$,
the energy $E$ and the absolute value of the staggered magnetization $|M_{stag}|$ against $p$,
with $\epsilon$ equal to 1.0; each curve is an average over 100 independent runs. The region
where both the magnetization $|M|$ and staggered magnetization $|M_{stag}|$ curves saturate to
1 corresponds to parallel alignment, whereas $|M| \approx 0$ with $|M_{stag}| \approx 1$
implies an anti-parallel alignment of the spin types. The energy graph is consistent with this
interpretation, given the definition of the energy in Equation (10): zero in the PFP, middling
in the paramagnetic phase and unity in the AFP.

FIG. 9. (color online) [ps update] Phase diagram in the $p$-$\epsilon$ plane of the ps-updated
model, with a plot of the absolute value of the staggered magnetization $|M_{stag}|$ at time
$t = 512$, for a lattice size of $N = 100^2$. The black region represents the disordered phase
(very low $|M_{stag}|$), while the yellowish (light grey) region represents frozen phases with
high $|M_{stag}|$.

FIG. 10. (color online) [ps update] Plot of the absolute value of the magnetization $|M|$, the
energy $E$ and the absolute value of the staggered magnetization $|M_{stag}|$ against $p$ with
$\epsilon$ set to 1.0 for a lattice size $N = 100^2$. Each curve is drawn using symbols (and
colour) as indicated in the legend.

Finally, we present the variation of the inverse energy with the natural logarithm of time,
$\ln t$, near the critical points $p_{c1}$ and $p_{c2}$ in Figure 11 and Figure 12
respectively. Each of the curves is an average over 200 independent runs. At criticality, both
plots show a linear proportionality between $1/E(t)$ and $\ln t$, with slopes of $4/3\pi$ and
$-4/15\pi$ at $p_{c1} = 0.31$ and $p_{c2} = 0.69$ respectively. Again, this indicates that the
ps update of the model belongs to the generalised, rather than the exact, universality class of
the voter model [12].

FIG. 11. (color online) [ps update] Plot of the inverse energy $1/E(t)$ versus $\ln t$ for
different values of $p$ close to $p_{c1} = 0.31$, with $\epsilon$ set to 1.0 for a lattice size
$N = 256^2$. The $p$-values are indicated on the curves. The straight line corresponding to
$p_{c1} = 0.31$ (shown in red) has a slope of approximately $4/3\pi$.

FIG. 12. (color online) [ps update] Plot of the inverse energy $1/E(t)$ versus $\ln t$ for
different values of $p$ close to $p_{c2} = 0.69$, with $\epsilon$ set to 1.0, for lattice size
$N = 256^2$. The $p$-values are indicated on the curves. The straight line corresponding to
$p_{c2} = 0.69$ (shown in red) has slope $-4/15\pi$ approximately.

To conclude, the ps update yields qualitatively similar results to the pp update, with the
appearance of two frozen phases, PFP and AFP. Again, small values of $\epsilon$, indicating
longer memories of outcomes, lead to a smearing out of the phase transition, because of the
decreasing effectiveness of the outcome-based rule.
D. Explanation for the nature of the phase 
diagrams for different updates 

In this subsection, we give arguments for the three most important features of the phase
diagrams presented above:

(i) The appearance of anti-parallel ordering in both pp and ps updates

(ii) The symmetry of the PFP and the AFP phases in both pp and ps updates

(iii) The positioning of the disordered phase in ss, pp and ps updates

The clue which explains all of the above is the formation of 'active' or disparate bonds by
the rules of the model under different updates: these are clearly the units of anti-parallel
ordering. Consider thus configurations where a site is surrounded by a majority of its own
kind: this would correspond to a local field of $+2$ for a $+$, and $-2$ for a $-$. Here the
majority of the bonds are 'like' or 'inactive'. The transition probability for the increase of
active bonds from such configurations is $1 - w_+(+2)$ (or $w_-(-2)$) [see Equation (8)]. The
transition probabilities for the decrease of active bonds are given by an opposite scenario,
yielding $w_-(+2)$ (or $1 - w_+(-2)$) [see Equation (8)]. We plot two of these transition
probabilities in Figure 13, corresponding respectively to an increase and a decrease of active
bonds: the former peaks at $p = 0.63$ while the latter peaks at $p = 0.37$.



FIG. 13. (color online) Transition probabilities $1 - w_+(+2)$ (drawn as a solid line, in
green (grey)) and $w_-(+2)$ (drawn as a dashed line, in black) against $p$ [from Equation (8)].

The net probability of having active bonds is the difference between these two transition
probabilities, and is plotted in Figure 14. We see from this that the probability of having
active bonds is greatest at $p = 0.79$, and least at $p = 0.21$. The last ingredient that we
need to explain the AFP phase in the pp and ps updates is the fact that once clusters with many
active bonds, i.e. anti-parallel ordering, are formed, the majority rule applied via the
parallel update preserves such ordering. With all this in place we see that, as expected, the
AFP phase in both pp and ps updates shows up in qualitatively the same regions as predicted by
Figure 14, with a peak in both cases at around $p = 0.79$. Correspondingly, the PFP in both pp
and ps updates shows up in the region predicted in this figure, with a peak in both cases at
around $p = 0.21$. Notice (Figure 14) that the peak and the dip in the probability of active
bonds are symmetric about $p = 0.5$, thus explaining the symmetry that we have observed in
Figure 4 and Figure 9. $p = 0.5$ is thus the natural point for the appearance of the disordered
phase in both pp and ps updates, as will be confirmed by an inspection of Figure 4, Figure 9
and Figure 14.
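
The peak positions quoted above can be checked directly from Equation (8). At coexistence ($p_+ = p_- = p$) with $\epsilon_\pm = 1$, the two probabilities reduce to $1 - w_+(+2) = p(1 - p^3)$ and $w_-(+2) = (1 - p)[1 - (1 - p)^3]$. The Python sketch below locates their extrema on a fine grid; the function names are hypothetical and the grid search is merely one simple way to do this (setting the derivatives to zero gives $p = 4^{-1/3}$ for the first peak and $p = (3 \pm \sqrt{3})/6$ for the extrema of the difference).

```python
def grow(p):
    """1 - w_+(+2) at coexistence with epsilon = 1: creation of active bonds."""
    return p * (1.0 - p ** 3)


def shrink(p):
    """w_-(+2) at coexistence with epsilon = 1: removal of active bonds."""
    return (1.0 - p) * (1.0 - (1.0 - p) ** 3)


def net(p):
    """Net probability of having active bonds."""
    return grow(p) - shrink(p)


def argmax_on_grid(f, n=10001):
    """Grid point in [0, 1] at which f is largest."""
    grid = [k / (n - 1) for k in range(n)]
    return max(grid, key=f)


peak_grow = argmax_on_grid(grow)
peak_shrink = argmax_on_grid(shrink)
peak_net = argmax_on_grid(net)
dip_net = argmax_on_grid(lambda p: -net(p))
```

To two decimal places the four values are 0.63, 0.37, 0.79 and 0.21, and the relation `net(1 - p) == -net(p)` makes the symmetry about $p = 0.5$ explicit.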




FIG. 14. (color online) The difference in the transition probabilities $1 - w_+(+2)$ and
$w_-(+2)$ is plotted against $p$ [see Equation (8)].

The only remaining point to be explained is the appearance of
the disordered phase in the ss update. In this case too, the
analysis leading to Figure 14 for the probabilities of having
active bonds remains valid. However, the sequential update of
the majority rule always favours strictly parallel ordering, so
that typically clusters of active bonds are destroyed once
formed. When the probability of their formation is strongest,
i.e. at $p = 0.63$ (see Figure 13), the competition between the
majority and outcome-based rules is at its most intense, and a
disordered phase may be expected to appear. Indeed, the
mid-point of the disordered phase for the ss update is shown in
Figure 1 to be in exact agreement with this predicted peak,
given as it is by $(p_{c1} + p_{c2})/2 = 0.63$.

E. The sp update 

In the case of this update, the phase diagram (Figure 15) shows
nothing but a frozen phase. As is evident from the plot of
inverse energy $1/E(t)$ versus $\ln t$ (Figure 16), there is a
continuous increase in $1/E(t)$ for all values of $p$ at
$\varepsilon = 1.0$ (where the phase transition is expected to
be the most visible). This suggests that the two rules, majority
and performance-based, do not compete with each other at all (it
is this competition that led to the appearance of the disordered
phase in all the other updates). We suggest that this might be
because the sequential update (with its more immediate
conversions) in the case of the majority rule completely
dominates the slower parallel update for the outcome-based rule:
this in turn leads to an increasing tendency for consensus,
independent of the value of $p$, with which our results are
consistent.

FIG. 15. (color online) [sp update] Phase diagram of the
sp-updated model. Plot of the absolute value of magnetization
$|M|$ at time $t = 512$ for lattice size $N = 64^2$ in the
$p$-$\varepsilon$ plane. No phase transition is visible.

FIG. 16. (color online) [sp update] Plot of the inverse energy
$1/E(t)$ versus $\ln(t)$ for different values of $p$ and
$\varepsilon = 1$, for a lattice size $N = 256^2$. The value of
$p$ for each curve is given by a different colour as indicated.
No phase transition is visible.

IV. AWAY FROM COEXISTENCE: WHEN THE
STRATEGIES ARE DISTINCT

Evidently, the real use of a competitive learning model such as
this one is when the agents have a choice of distinct
strategies. The full exploration of the behaviour of the model
at coexistence as carried out in this paper as well as in
earlier work [(| 0] was aimed at an understanding of its phase
diagram. However, in the exploration of the behaviour of the
model away from coexistence, we hope to gain an understanding of
the relative importance of parameters such as superiority of
strategy (modelled by $p$) and memory (modelled by
$\varepsilon$), when these are in competition. The behaviour in
asymmetric conditions (using $p_+ > p_-$ and
$\varepsilon_+ > \varepsilon_-$) is formulated in terms of the
application of two biasing 'fields' @

$$H = p_+ - p_-, \qquad B = \varepsilon_+ - \varepsilon_-, \tag{12}$$

such that one strategy is favoured over the other.

In the following subsection, we look at a linear response
formulation of our question in terms of unequal $p$'s, viewed as
a biasing field, keeping $\varepsilon$ the same for both
strategies. In the final subsection, we look at unequal
strategies as well as unequal memories, to find out whether
inferior strategies applied with a good memory of past outcomes
can win overall.

A. Linear response theory: strategies with unequal $p$

Linear response theory rests on the premise that an order
parameter such as the magnetisation undergoes a sharp change in
the neighbourhood of a critical point. In both the ss and pp
updates of this model, there are two critical points $p_{c1}$
and $p_{c2}$ separating a paramagnetic phase from two frozen
phases. In this subsection, we look at the linear response
behaviour of the model in the vicinity of both critical points,
starting from the disordered phase: clearly the response will
depend both on the value of $p$ and on the value of the biasing
field $H$ (defined in terms of the difference of the $p$'s in
Equation (12)). In the following, we examine the response by
choosing a given value of $p$, and writing
$p_\pm = p \pm H/2$, keeping $\varepsilon$ fixed.
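
The parametrisation $p_\pm = p \pm H/2$ can be sketched in a few lines of Python. This is only an illustration of how a sweep over the biasing field might be set up; the function names, the symmetric grid of field values and the range check are assumptions made here, not a description of the code used for the simulations.

```python
def biased_rates(p, H):
    """Return (p_plus, p_minus) = (p + H/2, p - H/2) for a biasing field H."""
    p_plus = p + H / 2.0
    p_minus = p - H / 2.0
    # Success rates are probabilities, so reject values outside [0, 1].
    if not (0.0 <= p_plus <= 1.0 and 0.0 <= p_minus <= 1.0):
        raise ValueError("bias H pushes a success rate outside [0, 1]")
    return p_plus, p_minus


def bias_sweep(p, h_max, steps):
    """Evenly spaced H values in [-h_max, h_max] with their (p_plus, p_minus)."""
    if steps < 2:
        raise ValueError("need at least two field values")
    sweep = []
    for k in range(steps):
        H = -h_max + 2.0 * h_max * k / (steps - 1)
        sweep.append((H, *biased_rates(p, H)))
    return sweep


# Example sweep around p = 0.58 (the value used in Figure 18).
for H, p_plus, p_minus in bias_sweep(0.58, 0.2, 5):
    print(f"H = {H:+.2f}  p+ = {p_plus:.2f}  p- = {p_minus:.2f}")
```

By construction the mean success rate stays at $p$ while the difference $p_+ - p_-$ equals $H$, so each curve of $M$ against $H$ is taken at fixed $p$.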

We first consider the ss-updated model. Figure 17 shows a plot
of magnetization $M$ against the biasing field $H$ at various
values of $p$ that are within the paramagnetic phase at
$\varepsilon = 1.0$. Each curve is obtained after averaging over
100 initial configurations using a square lattice of size
$N = 100^2$. For each $p$ in the paramagnetic phase, we see a
linear behaviour of $M$ against $H$ around $H \approx 0$, with
all subsequent increases in the field strength leading to
saturation, as expected. For a given $p$ value we observe a
functional dependence of the form

$$M = \tanh(bH)$$

where 

$$b \propto (p_{\mathrm{central}} + p)$$

taking

$$p_{\mathrm{central}} = (p_{c1} + p_{c2})/2.$$

FIG. 17. (color online) [ss update] Plot of magnetization $M$
against biasing field $H$ for different values of $p$. Each
curve is drawn using different symbols (and colour) as shown in
the legend, at time $t = 2000$ with $N = 100^2$ and
$\varepsilon = 1.0$.



The quality of the fit to $\tanh(bH)$ is seen in Figure 18: the
black fitting curve almost completely coincides with a sample
curve taken from Figure 17.
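
One simple way of extracting the slope $b$ from a measured curve of $M$ against $H$ is sketched below in Python. It uses the linearised form $\mathrm{artanh}(M) = bH$ and a least-squares line through the origin. This is an assumption made for illustration: the fitting procedure actually used for Figure 18 is not specified in the text, and the function name, the clipping threshold and the synthetic data are all hypothetical.

```python
import math


def fit_tanh_slope(fields, magnetisations, clip=0.999):
    """Least-squares estimate of b in M = tanh(b * H).

    Uses the linearised form atanh(M) = b * H, a line through the origin,
    so that b = sum(H * atanh(M)) / sum(H ** 2).
    """
    num = 0.0
    den = 0.0
    for H, M in zip(fields, magnetisations):
        # Clip saturated points, where atanh diverges.
        M = max(-clip, min(clip, M))
        num += H * math.atanh(M)
        den += H * H
    if den == 0.0:
        raise ValueError("need at least one non-zero field value")
    return num / den


# Synthetic check with a made-up slope (not a value from the paper).
b_true = 12.0
fields = [-0.1 + 0.02 * k for k in range(11)]
data = [math.tanh(b_true * H) for H in fields]
b_fit = fit_tanh_slope(fields, data)
print(abs(b_fit - b_true) < 1e-6)  # True
```

The linearisation gives most weight to points at large $|H|$, where noise in nearly saturated data is amplified; for noisy simulation data a direct nonlinear fit restricted to the region around $H \approx 0$ may be preferable.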




FIG. 18. (color online) [ss update] Plot of magnetization $M$
against field $H$ for $p = 0.58$ and $\varepsilon = 1.0$ at time
$t = 2000$ with $N = 100^2$. The fit of a $\tanh(bH)$ curve
almost completely overlaps with our numerical results.

These results also admit of an alternative representation,
shown in Figure 19, where it is clear that the relative values
of the bias correspond to different regions of domination of
each strategy in phase space.

We next examine the linear response behaviour of the pp-updated
model, again in the vicinity of the two critical points.
Figure 20 is a plot showing the variation in magnetization $M$
with the field $H$ for different $p$ values at
$\varepsilon = 1.0$. For the lower values of $p$, in the
vicinity of $p_{c1}$, we see very similar behaviour to that
presented in Figure 17, corresponding to an expected
$\tanh(bH)$ behaviour as shown in Figure 18: the PFP phase lying
to the left of $p_{c1}$ is, after all, identical to the frozen
phases in the ss update. As we approach the vicinity of
$p_{c2}$, the curves are markedly different: the nature of the
ordered phase is one that corresponds to magnetisation values of
zero (see orange curve drawn using plus symbols in Figure 20),
which is again consistent with the AFP phase that lies to the
right of $p_{c2}$.

FIG. 19. (color online) [ss update] Plot of magnetization $M$
in the $p_+$-$p_-$ plane at time $t = 7000$, for a lattice of
size $N = 100^2$ with $\varepsilon = 1.0$. The $+$ strategy
dominates in the yellow (light grey) region, while the $-$
strategy dominates in the brown (black) region.

To establish this more firmly we look at plots of the absolute
value of the staggered magnetisation $|M_{\mathrm{stag}}|$ as a
function of bias $H$, in Figure 21. The green (triangle), blue
(square) and black (plus) lines denote increasing values of
$p < p_{c2}$, where the staggered magnetisation increasingly
approaches zero, as expected in the disordered phase; however,
the red (star) line, corresponding to $p > p_{c2}$, shows an
abrupt jump in the value of $|M_{\mathrm{stag}}|$ to unity.
Combined with the analysis of the previous paragraph, this shows
convincingly that the phase we refer to as AFP indeed
corresponds to anti-parallel ordering.
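
The distinction between the two order parameters can be illustrated with a short Python sketch. It assumes the usual checkerboard definition of the staggered magnetisation, $M_{\mathrm{stag}} = N^{-1} \sum_{i,j} (-1)^{i+j} s_{ij}$ with $s_{ij} = \pm 1$, which is not spelled out in this section; the function names are hypothetical.

```python
def magnetisation(lattice):
    """Mean strategy value M on a square lattice of +1 / -1 entries."""
    n = len(lattice)
    return sum(sum(row) for row in lattice) / (n * n)


def staggered_magnetisation(lattice):
    """Checkerboard-weighted mean: sites with odd i + j are counted with sign -1."""
    n = len(lattice)
    total = 0
    for i in range(n):
        for j in range(n):
            sign = 1 if (i + j) % 2 == 0 else -1
            total += sign * lattice[i][j]
    return total / (n * n)


# Parallel ordering: |M| = 1 and M_stag = 0.
uniform = [[1] * 4 for _ in range(4)]
# Anti-parallel ordering: M = 0 and |M_stag| = 1.
checkerboard = [[1 if (i + j) % 2 == 0 else -1 for j in range(4)] for i in range(4)]

print(magnetisation(uniform), staggered_magnetisation(uniform))            # 1.0 0.0
print(magnetisation(checkerboard), staggered_magnetisation(checkerboard))  # 0.0 1.0
```

A configuration with anti-parallel ordering is therefore invisible to $M$ but saturates $|M_{\mathrm{stag}}|$, which is why the staggered magnetisation is the natural diagnostic for the AFP phase.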

We present below an alternative representation of the above
results for ease of visualisation. In Figure 22, the
magnetisation $M$ is plotted in the $p_+$-$p_-$ plane: as
before, the regions of brown (black) (resp. yellow (light grey))
correspond to domination by $-$ strategies (resp. $+$
strategies). Notice, however, that the coexistence line has an
island of very low magnetisation: in actual fact, this
corresponds to the regions of both the paramagnetic and AFP
phases. This is clearer in the plot of the absolute value of the
staggered magnetisation $|M_{\mathrm{stag}}|$, shown in
Figure 23, where the black portion of the island along the
coexistence line corresponds to the disordered phase, while the
faintly brown (grey) portion corresponds to the AFP.

These plots allow us to go beyond the previous analysis in
defining the domain of stability of the AFP phase: we see
clearly from Figure 22 and Figure 23 that the AFP phase exists
for $p > p_{c2}$ only if the biasing field is within the bounds
defined by $H = |p_+ - p_-| < 0.19 \pm 0.02$. In qualitative
terms, this implies that at least in the absence of memory, when
the two strategies have nearly equal success rates, neighbouring
agents may adopt different strategies [14] in equilibrium.

FIG. 20. (color online) [pp update] Plot of magnetization $M$
against field $H$ for different values of $p$, each curve
indicated by a different symbol (and colour) as shown in the
legend, at time $t = 2000$ with $N = 100^2$ and
$\varepsilon = 1.0$.

FIG. 21. (color online) [pp update] Plot of the absolute value
of staggered magnetization $|M_{\mathrm{stag}}|$ against field
$H$ for different values of $p$, each curve indicated by a
different symbol (and colour) as shown in the legend, at time
$t = 2000$ with $N = 64^2$ and $\varepsilon = 1.0$.

FIG. 22. (color online) [pp update] Plot of magnetization $M$
in the $p_+$-$p_-$ plane for a lattice of size $N = 100^2$ at
time $t = 7000$, with $\varepsilon = 1.0$. As before, the
regions of $+$ and $-$ strategy domination are coloured yellow
(light grey) and brown (black); the orange (grey) region
corresponds to both the paramagnetic and the AFP region (see
text).

Having investigated the linear response regime for the ss- and
pp-updated models, we turn in the next subsection to the effect
of the memory parameter $\varepsilon$.



B. Role of memory parameters: the case of
unequal $\varepsilon$

The principal competition in this model is that between two
strategies with different global success rates $p$, which
determines the relative dominance of each one in phase space.
The memory parameter $\varepsilon$ plays a more subtle role in
this competition: although it cannot be a determinant of phase
behaviour in the way that the success rates are (as a
consequence of the rules elucidated in Equation (5)), it can, as
we will show, cause a surprising change in the dominance of an
ostensibly superior strategy. In Q , it had been suggested that
agents with inferior strategies and good memories might indeed
win against agents who had better strategies but worse memories.
Here, we will make this prediction more quantitative.

FIG. 23. (color online) [pp update] Plot of staggered
magnetization $|M_{\mathrm{stag}}|$ in the $p_+$-$p_-$ plane,
for a lattice of size $N = 100^2$ at time $t = 7000$, with
$\varepsilon = 1.0$. Here, yellow (light grey) represents the
region of parallel ordering, black represents the disordered
phase, and the light brown (grey) represents AFP order.

The phase diagram of the model away from coexistence involves
four parameters, $p_\pm$ and $\varepsilon_\pm$, so that its
representation is a non-trivial problem. In the following, we
choose to fix $p_+$ to 0.5, and to vary the other three
parameters: a sample 3D plot is shown in Figure 24. We analyse
the three visible faces in detail, before remarking on the phase
behaviour within the cube: the colour coding is such that green
(grey) represents dominance of $+$ strategies, blue (black)
represents dominance of $-$ strategies, and other colours
represent mixed states.




FIG. 24. (color online) [pp update] A 3D plot of magnetization
$M$ with parameters $\varepsilon_+$ (along x), $\varepsilon_-$
(along y) and $p_-$ (along z), setting $p_+ = 0.50$ for a
lattice of size $N = 64^2$ at time $T = 2000$. Green (grey)
denotes the dominance of the $+$'s, while blue (black) denotes
that of the $-$'s. The other colours represent cases of
intermediate ordering.

• The leftmost face of the cube corresponds to the plane
$\varepsilon_+ = 0$; this implies that the agents using $+$
strategies will never convert, no matter what the outcome-based
rule says. The minimum occupancy of $+$ strategies for random
configurations should thus be of the order of $N/2$, which can
only increase depending on the conversions of agents using $-$
strategies into the camp of the $+$'s. The bottom line
corresponds to $p_- = 0$, which is when such conversions are
maximal (so that all $N$ sites are $+$): the green (grey) colour
is at its most pronounced here, changing gradually over to other
colours only as $\varepsilon_- \to 0$ to the right of the line,
when agents using $-$ strategies too begin to refuse to convert,
irrespective of the outcome rules. As the values of $p_-$
increase beyond 0.5 (the fixed value for $p_+$), we note that
the dominance of the $+$ strategy gradually gives way to states
with a mixture of strategies; when $\varepsilon_- \to 0$, this
tendency is at its most pronounced, while when
$\varepsilon_- \to 1$, this is at its least pronounced, since
local conversions can sometimes go against global success rates.

• The front face of the cube corresponds to
$\varepsilon_- = 0$; this implies that agents using $-$
strategies will never convert, no matter what the outcome-based
rule says. This is a reflection of the previous case, where the
minimum number of $-$ sites is once again $N/2$, which can only
be increased as the conversions from the $+$'s add to it.

• The top face of the cube corresponds to $p_- = 1$, where
globally a predominance of the $-$ strategy is expected. This is
found over almost all the range of $\varepsilon_-$ except at low
values of $\varepsilon_+$, where agents using the $+$ strategy
refuse to convert, despite their globally poorer performance.

FIG. 25. (color online) [pp update] A 3D plot of magnetization
$M$ for the parameters $\varepsilon_+$ (lying between 0.7 and
0.8), $\varepsilon_-$ and $p_-$, with $p_+ = 0.70$, for a
lattice of size $N = 64^2$ at time $T = 2000$. The red (lower
black) region represents the dominance of the $+$ strategy and
the blue (upper black) region represents the dominance of the
$-$ strategy. The green (grey) region represents AFP.

The interior of the cube can show markedly different behaviour,
which we illustrate via a sample slice shown in Figure 25. In
this figure, red (lower black) and blue (upper black) regions
correspond to the dominance of $+$ and $-$ strategies
respectively. Here we set the value of $p_+$ to 0.7, and look at
a slice of its phase space cube, as before: choosing
$\varepsilon_+$ to be between 0.7 and 0.8, we look at the
dominating strategy as a function of the variables
$\varepsilon_-$ and $p_-$. If the memory parameters had not
existed, we would have expected the $-$ strategy (blue (upper
black) region in the figure) to predominate only for
$p_- > 0.7$; the red (lower black) region would have covered the
entire slice below this, corresponding to the dominance of the
$+$ strategy. However, the reality is rather different. The $+$
strategy does indeed predominate for $p_- < 0.7$, provided that
agents using the $-$ strategy have imperfect memory; but there
is a striking predominance of the $-$ strategy (even for very
low values of $p_-$) provided that the memory of the agents
employing this strategy is much better than that of the other
kind ($\varepsilon_- \ll \varepsilon_+$).

FIG. 26. (color online) [pp update] A 3D plot of absolute
staggered magnetization $|M_{\mathrm{stag}}|$ with parameters
$\varepsilon_+$ (along x), $\varepsilon_-$ (along y) and $p_-$
(along z) for $p_+ = 0.50$ and lattice size $N = 64^2$, at time
$T = 2000$. The graph delineates the region of AFP for these
parameter values.

A last feature to mention is the green (grey) region in
Figure 25: here, there is a region of alternating $+$ and $-$
ordering (AFP), corresponding to ongoing competition between the
two strategies. This phenomenon is most pronounced when the two
strategies are equally successful, and are both accompanied by
weak memories of earlier outcomes. In Figure 26, the structure
of the full AFP is shown (by selecting phase points with low
values of absolute magnetisation $|M|$ and high absolute
staggered magnetization $|M_{\mathrm{stag}}|$) as a function of
$p_-$ and $\varepsilon_\pm$, fixing $p_+ = 0.5$.
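
The selection of AFP phase points by low $|M|$ and high $|M_{\mathrm{stag}}|$ can be sketched as a simple threshold rule in Python. The thresholds `low` and `high`, the function name and the other phase labels are hypothetical choices made for illustration; the cut-offs actually used to produce Figure 26 are not given in the text.

```python
def classify_phase(m, m_stag, low=0.1, high=0.9):
    """Label a phase point from M and M_stag.

    The thresholds `low` and `high` are illustrative assumptions,
    not values taken from the paper.
    """
    abs_m, abs_stag = abs(m), abs(m_stag)
    if abs_m <= low and abs_stag >= high:
        return "AFP"  # anti-parallel ordering
    if abs_m >= high:
        return "+ dominated" if m > 0 else "- dominated"
    if abs_m <= low and abs_stag <= low:
        return "disordered"
    return "mixed"


print(classify_phase(0.0, 1.0))    # AFP
print(classify_phase(0.97, 0.0))   # + dominated
print(classify_phase(-0.95, 0.0))  # - dominated
print(classify_phase(0.02, 0.03))  # disordered
```

Applying such a rule at every point of the ($\varepsilon_+$, $\varepsilon_-$, $p_-$) grid and keeping only the points labelled "AFP" would give a picture of the kind shown in Figure 26.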

V. DISCUSSION 

This paper extends earlier work on a problem of strategic
learning [(| 0] which, although originally suggested by a
problem on technology diffusion [J], has much wider
ramifications (e.g., in relation to threshold learning
dynamics 8]).

In any agent-based modelling scheme, it is important 
to know whether agents react sequentially or collectively 
to the spread of information. Our results show that 
these issues make a quantitative as well as a qualita- 
tive difference to the results, changing not just expo- 
nents but also the entire nature of the phase diagram 
in most cases. Given that, typically, the propagation of 
technologies through well-connected societies is of inter- 
est [H, we choose ordered rather than random updates, 
and examine the response of the model of @ to all pos- 
sible combinations of sequential and parallel updating. 
From the viewpoint of theoretical physics, a major result 
is that this model is robustly in the universality class of 
the voter model for all but one of the updates. This 
strong relationship with the voter model results from the 
model of [1,0] being driven by interfacial noise alone, i.e. 
the absence of surface tension [ijj.

Another major result, still to do with updates, is the
appearance of a phase of anti-parallel ordering (AFP) in the
high-performing limits of $p$, for both the pp and the ps
updates. While the technicalities behind this are explained in
the text, we give here a more intuitive reason for this, from
the perspective of strategic learning. The parallel scheme can
be viewed as a more 'equilibrated' update than the sequential
one, since it gives a chance for the entire lattice to be
updated 'simultaneously'. It is then natural that in the regime
where both strategies are high-performing, they should be
equally preferred: this lies behind the 'alternating' order
inherent in the AFP regime. By contrast, since the sequential
paradigm corresponds to a 'non-equilibrium' update, where every
agent responds to the updated value of its neighbours, the above
logic leads to a disordered phase where every prescription of
the outcome-based rule is countermanded by the following
majority rule. Using once again the illustration of propagating
technologies [J]: when all the populace have equal and
simultaneous access to information about two high-performing
technologies, we will see the coexistence of both [14] (as
predicted by the AFP phase), whereas when information about each
one is passed on sequentially, the conflicting information so
obtained can result in sheer disorder. Finally, we mention here
that our investigation of different updates on this
game-theoretic model has been applied to related game-theoretic
models of cognitive learning and synaptic plasticity [HI, [16],
where updates relate to the directionality of synapses in a
network.

Moving away from the domain of critical behaviour 
at coexistence, we have looked at the behaviour of the 
competitive learning model when the two strategies have 
distinct attributes (this, after all, is truer to the title of 
competitive learning!). To begin with, we have examined 
the response of the model to unequal strategies $p_\pm$, and
have found in general that the smarter strategy wins (for
equal values of the memory parameter $\varepsilon$), as might be
expected. An interesting feature is that the region of 
anti-parallel ordering (AFP) found earlier still persists in 
the presence of bias, provided that the difference in p is 
below a well-defined bound: in other words, when two 
distinct strategies are almost equally successful, one will 
typically find that they can coexist in society. Finally, 
we have looked at the effect of memory: we have found 
that while memory has a secondary role in determining 
the phase behaviour of the model, it has a particularly 
striking effect in turning around the results of any bias 
in p. A major result of our paper is thus that decisions 
based on a good memory of earlier outcomes can, within 
limits, compensate for the choice of inferior strategies. 